Overview

Dataset statistics

Number of variables: 13
Number of observations: 517
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 4
Duplicate rows (%): 0.8%
Total size in memory: 113.2 KiB
Average record size in memory: 224.2 B

Variable types

NUM: 11
CAT: 2

Reproduction

Analysis started: 2021-03-11 17:24:49.735910
Analysis finished: 2021-03-11 17:25:25.357374
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
Dataset has 4 (0.8%) duplicate rows (Duplicates)
rain has 509 (98.5%) zeros (Zeros)
area has 247 (47.8%) zeros (Zeros)
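
The percentages and the average record size reported above follow from the raw counts. A minimal sketch of that arithmetic, using only figures from this report; the helper name `pct` is illustrative, and treating 1 KiB as 1024 bytes is an assumption:

```python
# Figures copied from the overview and warnings above.
n_observations = 517
duplicate_rows = 4
rain_zeros = 509
area_zeros = 247
total_size_kib = 113.2


def pct(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


duplicate_pct = pct(duplicate_rows, n_observations)
rain_zero_pct = pct(rain_zeros, n_observations)
area_zero_pct = pct(area_zeros, n_observations)

# Average record size in bytes, assuming 1 KiB = 1024 bytes.
avg_record_bytes = round(total_size_kib * 1024 / n_observations, 1)

print(duplicate_pct, rain_zero_pct, area_zero_pct, avg_record_bytes)
```

The four results agree with the 0.8%, 98.5%, 47.8% and 224.2 B shown in the report.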

Variables

X
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.669245648
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 4
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.313777829
Coefficient of variation (CV): 0.4955356825
Kurtosis: -1.172330846
Mean: 4.669245648
Median Absolute Deviation (MAD): 2.025874615
Skewness: 0.03624582161
Sum: 2414
Variance: 5.353567841
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 4.5 5.5 8.5 9. ], "bayesian blocks" binning strategy used)
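
Because X takes only nine distinct values, its statistics can be re-derived from the frequency table for X in this report. The sketch below uses only Python's standard library; the variable names are illustrative. It assumes the sample variance (division by n - 1) and linearly interpolated quartiles, which is what the reported numbers appear to correspond to. It also computes two candidate readings of "MAD": the reported 2.025874615 matches the mean absolute deviation around the mean, whereas the median absolute deviation around the median works out to 2, so the label in the report may be imprecise.

```python
import statistics

# Value -> count pairs for X, copied from the frequency table in this report.
x_counts = {1: 48, 2: 73, 3: 55, 4: 91, 5: 30, 6: 86, 7: 60, 8: 61, 9: 13}

# Expand the counts back into the individual observations.
x = sorted(v for v, c in x_counts.items() for _ in range(c))

n = len(x)
total = sum(x)
mean = total / n
variance = statistics.variance(x)  # sample variance (divides by n - 1)
std = statistics.stdev(x)
cv = std / mean                    # coefficient of variation

# Quartiles with linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1

# Two candidate readings of "MAD".
mean_abs_dev = sum(abs(v - mean) for v in x) / n
median_abs_dev = statistics.median(abs(v - median) for v in x)

print(mean, variance, cv, (q1, median, q3), mean_abs_dev, median_abs_dev)
```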
Value Count Frequency (%)
4 91 17.6%
6 86 16.6%
2 73 14.1%
8 61 11.8%
7 60 11.6%
3 55 10.6%
1 48 9.3%
5 30 5.8%
9 13 2.5%

Value Count Frequency (%)
1 48 9.3%
2 73 14.1%
3 55 10.6%
4 91 17.6%
5 30 5.8%

Value Count Frequency (%)
9 13 2.5%
8 61 11.8%
7 60 11.6%
6 86 16.6%
5 30 5.8%

Y
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.299806576
Minimum: 2
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 4
median: 4
Q3: 5
95-th percentile: 6
Maximum: 9
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.229900403
Coefficient of variation (CV): 0.2860362161
Kurtosis: 1.420553416
Mean: 4.299806576
Median Absolute Deviation (MAD): 0.9487034633
Skewness: 0.4172962459
Sum: 2223
Variance: 1.512655001
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2. 3.5 4.5 5.5 7. 8.5 9. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
4 203 39.3%
5 125 24.2%
6 74 14.3%
3 64 12.4%
2 44 8.5%
9 6 1.2%
8 1 0.2%

Value Count Frequency (%)
2 44 8.5%
3 64 12.4%
4 203 39.3%
5 125 24.2%
6 74 14.3%

Value Count Frequency (%)
9 6 1.2%
8 1 0.2%
6 74 14.3%
5 125 24.2%
4 203 39.3%

month
Categorical

Distinct count: 12
Unique (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB
Value Count Frequency (%)
aug 184 35.6%
sep 172 33.3%
mar 54 10.4%
jul 32 6.2%
feb 20 3.9%
jun 17 3.3%
oct 15 2.9%
dec 9 1.7%
apr 9 1.7%
jan 2 0.4%
Other values (2) 3 0.6%

day
Categorical

Distinct count: 7
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB
Value Count Frequency (%)
sun 95 18.4%
fri 85 16.4%
sat 84 16.2%
mon 74 14.3%
tue 64 12.4%
thu 61 11.8%
wed 54 10.4%

FFMC
Real number (ℝ≥0)

Distinct count: 106
Unique (%): 20.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.64468085
Minimum: 18.7
Maximum: 96.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 18.7
5-th percentile: 84.1
Q1: 90.2
median: 91.6
Q3: 92.9
95-th percentile: 95.1
Maximum: 96.2
Range: 77.5
Interquartile range (IQR): 2.7

Descriptive statistics

Standard deviation: 5.520110849
Coefficient of variation (CV): 0.06089834282
Kurtosis: 67.06604054
Mean: 90.64468085
Median Absolute Deviation (MAD): 2.772072925
Skewness: -6.575605977
Sum: 46863.3
Variance: 30.47162378
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[18.7 77.3 83.95 90.05 91.55 ... 92.05 92.15 92.55 93.8 96.2 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
91.6 28 5.4%
92.1 28 5.4%
91 22 4.3%
91.7 19 3.7%
93.7 16 3.1%
92.4 16 3.1%
92.5 15 2.9%
94.8 14 2.7%
90.1 12 2.3%
92.9 12 2.3%
Other values (96) 335 64.8%

Value Count Frequency (%)
18.7 1 0.2%
50.4 1 0.2%
53.4 1 0.2%
63.5 2 0.4%
68.2 1 0.2%

Value Count Frequency (%)
96.2 2 0.4%
96.1 6 1.2%
96 2 0.4%
95.9 2 0.4%
95.8 1 0.2%

DMC
Real number (ℝ≥0)

Distinct count: 215
Unique (%): 41.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.8723404
Minimum: 1.1
Maximum: 291.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 1.1
5-th percentile: 14.92
Q1: 68.6
median: 108.3
Q3: 142.4
95-th percentile: 231.1
Maximum: 291.3
Range: 290.2
Interquartile range (IQR): 73.8

Descriptive statistics

Standard deviation: 64.04648225
Coefficient of variation (CV): 0.5776596941
Kurtosis: 0.2048217813
Mean: 110.8723404
Median Absolute Deviation (MAD): 49.16270628
Skewness: 0.5474977945
Sum: 57321
Variance: 4101.951889
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1.1 51.25 52.75 80.8 108.15 108.8 149.8 180.75 182.2 291.3 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
99 10 1.9%
129.5 9 1.7%
142.4 8 1.5%
231.1 8 1.5%
108.4 7 1.4%
35.8 7 1.4%
137 7 1.4%
126.5 7 1.4%
108.3 7 1.4%
102.3 6 1.2%
Other values (205) 441 85.3%

Value Count Frequency (%)
1.1 1 0.2%
2.4 1 0.2%
3 2 0.4%
3.2 1 0.2%
3.6 1 0.2%

Value Count Frequency (%)
291.3 1 0.2%
290 4 0.8%
287.2 1 0.2%
284.9 1 0.2%
276.3 4 0.8%

DC
Real number (ℝ≥0)

Distinct count: 219
Unique (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 547.9400387
Minimum: 7.9
Maximum: 860.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 7.9
5-th percentile: 43.58
Q1: 437.7
median: 664.2
Q3: 713.9
95-th percentile: 795.3
Maximum: 860.6
Range: 852.7
Interquartile range (IQR): 276.2

Descriptive statistics

Standard deviation: 248.0661917
Coefficient of variation (CV): 0.4527250688
Kurtosis: -0.245243519
Mean: 547.9400387
Median Absolute Deviation (MAD): 203.6887661
Skewness: -1.100445125
Sum: 283285
Variance: 61536.83547
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 7.9 105.25 293.55 577.3 664.35 692.05 693.7 766.2 860.6 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
745.3 10 1.9%
692.6 9 1.7%
692.3 8 1.5%
715.1 8 1.5%
698.6 8 1.5%
601.4 8 1.5%
80.8 7 1.4%
686.5 7 1.4%
764 7 1.4%
647.1 7 1.4%
Other values (209) 438 84.7%

Value Count Frequency (%)
7.9 1 0.2%
9.3 1 0.2%
15.3 1 0.2%
15.5 1 0.2%
15.8 1 0.2%

Value Count Frequency (%)
860.6 1 0.2%
855.3 4 0.8%
849.3 1 0.2%
844 1 0.2%
825.1 4 0.8%

ISI
Real number (ℝ≥0)

Distinct count: 119
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.021663443
Minimum: 0
Maximum: 56.1
Zeros: 1
Zeros (%): 0.2%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.6
Q1: 6.5
median: 8.4
Q3: 10.8
95-th percentile: 17
Maximum: 56.1
Range: 56.1
Interquartile range (IQR): 4.3

Descriptive statistics

Standard deviation: 4.559477175
Coefficient of variation (CV): 0.5053920714
Kurtosis: 21.4580365
Mean: 9.021663443
Median Absolute Deviation (MAD): 3.180718249
Skewness: 2.536325266
Sum: 4664.2
Variance: 20.78883211
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 1.85 6. 6.25 6.35 ... 13.75 14.35 17.95 22.65 56.1 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
9.6 23 4.4%
7.1 21 4.1%
6.3 20 3.9%
8.4 17 3.3%
7 17 3.3%
6.2 16 3.1%
9.2 15 2.9%
7.5 14 2.7%
7.8 12 2.3%
9 12 2.3%
Other values (109) 350 67.7%

Value Count Frequency (%)
0 1 0.2%
0.4 2 0.4%
0.7 1 0.2%
0.8 3 0.6%
1.1 1 0.2%

Value Count Frequency (%)
56.1 1 0.2%
22.7 1 0.2%
22.6 1 0.2%
21.3 1 0.2%
20.3 4 0.8%

temp
Real number (ℝ≥0)

Distinct count: 192
Unique (%): 37.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.88916828
Minimum: 2.2
Maximum: 33.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 2.2
5-th percentile: 8.2
Q1: 15.5
median: 19.3
Q3: 22.8
95-th percentile: 27.9
Maximum: 33.3
Range: 31.1
Interquartile range (IQR): 7.3

Descriptive statistics

Standard deviation: 5.80662535
Coefficient of variation (CV): 0.3074050304
Kurtosis: 0.1361655077
Mean: 18.88916828
Median Absolute Deviation (MAD): 4.509991807
Skewness: -0.3311722373
Sum: 9765.7
Variance: 33.71689795
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 2.2 4.4 5.4 9.95 15.15 24.4 28.8 33.3 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
19.6 8 1.5%
17.4 8 1.5%
20.6 7 1.4%
15.4 7 1.4%
23.4 6 1.2%
18.9 6 1.2%
16.8 6 1.2%
15.9 6 1.2%
20.8 6 1.2%
20.1 6 1.2%
Other values (182) 451 87.2%

Value Count Frequency (%)
2.2 1 0.2%
4.2 1 0.2%
4.6 6 1.2%
4.8 1 0.2%
5.1 5 1.0%

Value Count Frequency (%)
33.3 1 0.2%
33.1 1 0.2%
32.6 1 0.2%
32.4 2 0.4%
32.3 1 0.2%

RH
Real number (ℝ≥0)

Distinct count: 75
Unique (%): 14.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.28820116
Minimum: 15
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 15
5-th percentile: 24
Q1: 33
median: 42
Q3: 53
95-th percentile: 77
Maximum: 100
Range: 85
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 16.31746924
Coefficient of variation (CV): 0.3684382931
Kurtosis: 0.438182856
Mean: 44.28820116
Median Absolute Deviation (MAD): 12.84294528
Skewness: 0.8629040079
Sum: 22897
Variance: 266.2598024
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 15. 20.5 26.5 27.5 31.5 46.5 59.5 79.5 100. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
27 33 6.4%
39 24 4.6%
35 20 3.9%
42 17 3.3%
43 17 3.3%
45 16 3.1%
34 16 3.1%
40 15 2.9%
33 15 2.9%
46 14 2.7%
Other values (65) 330 63.8%

Value Count Frequency (%)
15 2 0.4%
17 1 0.2%
18 1 0.2%
19 4 0.8%
20 1 0.2%

Value Count Frequency (%)
100 1 0.2%
99 1 0.2%
97 1 0.2%
96 1 0.2%
94 1 0.2%

wind
Real number (ℝ≥0)

Distinct count: 21
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.017601547
Minimum: 0.4
Maximum: 9.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 1.3
Q1: 2.7
median: 4
Q3: 4.9
95-th percentile: 7.6
Maximum: 9.4
Range: 9
Interquartile range (IQR): 2.2

Descriptive statistics

Standard deviation: 1.791652601
Coefficient of variation (CV): 0.4459507942
Kurtosis: 0.05432381711
Mean: 4.017601547
Median Absolute Deviation (MAD): 1.437061757
Skewness: 0.571001127
Sum: 2077.1
Variance: 3.210019042
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.4 1.55 5.6 6.5 9.4 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
2.2 53 10.3%
3.1 53 10.3%
4 51 9.9%
4.9 48 9.3%
2.7 44 8.5%
4.5 41 7.9%
5.4 41 7.9%
3.6 40 7.7%
1.8 31 6.0%
5.8 24 4.6%
Other values (11) 91 17.6%

Value Count Frequency (%)
0.4 1 0.2%
0.9 13 2.5%
1.3 14 2.7%
1.8 31 6.0%
2.2 53 10.3%

Value Count Frequency (%)
9.4 4 0.8%
8.9 1 0.2%
8.5 8 1.5%
8 5 1.0%
7.6 14 2.7%

rain
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02166344294
Minimum: 0
Maximum: 6.4
Zeros: 509
Zeros (%): 98.5%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 6.4
Range: 6.4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2959591209
Coefficient of variation (CV): 13.66168442
Kurtosis: 421.2959636
Mean: 0.02166344294
Median Absolute Deviation (MAD): 0.04265645051
Skewness: 19.81634398
Sum: 11.2
Variance: 0.08759180124
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.1 1.2 6.4], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 509 98.5%
0.8 2 0.4%
0.2 2 0.4%
0.4 1 0.2%
1.4 1 0.2%
6.4 1 0.2%
1 1 0.2%

Value Count Frequency (%)
0 509 98.5%
0.2 2 0.4%
0.4 1 0.2%
0.8 2 0.4%
1 1 0.2%

Value Count Frequency (%)
6.4 1 0.2%
1.4 1 0.2%
1 1 0.2%
0.8 2 0.4%
0.4 1 0.2%

area
Real number (ℝ≥0)

ZEROS
Distinct count: 251
Unique (%): 48.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.84729207
Minimum: 0
Maximum: 1090.84
Zeros: 247
Zeros (%): 47.8%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.52
Q3: 6.57
95-th percentile: 48.714
Maximum: 1090.84
Range: 1090.84
Interquartile range (IQR): 6.57

Descriptive statistics

Standard deviation: 63.65581847
Coefficient of variation (CV): 4.954804337
Kurtosis: 194.1407211
Mean: 12.84729207
Median Absolute Deviation (MAD): 18.56683096
Skewness: 12.84693353
Sum: 6642.05
Variance: 4052.063225
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 4.50000e-02 3.34000e+00 1.14250e+01 3.19650e+01 7.10300e+01 2.45705e+02 1.09084e+03], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 247 47.8%
1.94 3 0.6%
28.66 2 0.4%
0.52 2 0.4%
9.96 2 0.4%
0.43 2 0.4%
1.64 2 0.4%
9.27 2 0.4%
1.75 2 0.4%
1.56 2 0.4%
Other values (241) 251 48.5%

Value Count Frequency (%)
0 247 47.8%
0.09 1 0.2%
0.17 1 0.2%
0.21 1 0.2%
0.24 1 0.2%

Value Count Frequency (%)
1090.84 1 0.2%
746.28 1 0.2%
278.53 1 0.2%
212.88 1 0.2%
200.94 1 0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
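
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, and the sample standard deviation (division by n - 1) is an assumption; the choice cancels out in the ratio. The data are the temp and RH values from the ten first rows shown in the Sample section of this report, so the result describes only those ten rows, not the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# temp and RH from the ten first rows of the sample.
temp = [8.2, 18.0, 14.6, 8.3, 11.4, 22.2, 24.1, 8.0, 13.1, 22.8]
rh = [51.0, 33.0, 33.0, 97.0, 99.0, 29.0, 27.0, 86.0, 63.0, 40.0]

r = pearson_r(temp, rh)

# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
temp_rescaled = [2 * t + 10 for t in temp]
r_rescaled = pearson_r(temp_rescaled, rh)

print(r, r_rescaled)
```

On these ten rows r is negative, and the rescaled variable gives the same coefficient up to floating-point error, which illustrates the invariance under location and scale changes.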

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
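
As a sketch of this definition, the snippet below ranks both variables and then applies the Pearson formula to the ranks. The function names are illustrative, and giving tied values the average of their positions is an assumption (it is a common convention, but the report does not state how ties are handled).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (n cancels out)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x^3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

For this cubic relationship ρ is 1 (up to floating-point error) while r stays below 1, which is the sense in which ρ catches nonlinear monotonic correlation.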

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
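
The pair-counting definition can be sketched directly. The function name `kendall_tau` is illustrative, and this is the plain form of the formula (often called tau-a); library implementations commonly use a tie-corrected variant, so values computed this way may differ from the tool's output when the data contain ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in either variable count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


tau_up = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])    # all pairs concordant
tau_down = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])  # all pairs discordant
tau_mixed = kendall_tau([1, 2, 3], [1, 3, 2])           # 2 concordant, 1 discordant
print(tau_up, tau_down, tau_mixed)
```

The quadratic loop over all pairs is fine for 517 observations; larger datasets would call for a faster algorithm.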

Missing values

Sample

First rows

   X  Y  month  day  FFMC    DMC     DC   ISI  temp    RH  wind  rain  area
0  7  5  mar    fri  86.2   26.2   94.3   5.1   8.2  51.0   6.7   0.0   0.0
1  7  4  oct    tue  90.6   35.4  669.1   6.7  18.0  33.0   0.9   0.0   0.0
2  7  4  oct    sat  90.6   43.7  686.9   6.7  14.6  33.0   1.3   0.0   0.0
3  8  6  mar    fri  91.7   33.3   77.5   9.0   8.3  97.0   4.0   0.2   0.0
4  8  6  mar    sun  89.3   51.3  102.2   9.6  11.4  99.0   1.8   0.0   0.0
5  8  6  aug    sun  92.3   85.3  488.0  14.7  22.2  29.0   5.4   0.0   0.0
6  8  6  aug    mon  92.3   88.9  495.6   8.5  24.1  27.0   3.1   0.0   0.0
7  8  6  aug    mon  91.5  145.4  608.2  10.7   8.0  86.0   2.2   0.0   0.0
8  8  6  sep    tue  91.0  129.5  692.6   7.0  13.1  63.0   5.4   0.0   0.0
9  7  5  sep    sat  92.5   88.0  698.6   7.1  22.8  40.0   4.0   0.0   0.0

Last rows

     X  Y  month  day  FFMC    DMC     DC   ISI  temp    RH  wind  rain   area
507  2  4  aug    fri  91.0  166.9  752.6   7.1  25.9  41.0   3.6   0.0   0.00
508  1  2  aug    fri  91.0  166.9  752.6   7.1  25.9  41.0   3.6   0.0   0.00
509  5  4  aug    fri  91.0  166.9  752.6   7.1  21.1  71.0   7.6   1.4   2.17
510  6  5  aug    fri  91.0  166.9  752.6   7.1  18.2  62.0   5.4   0.0   0.43
511  8  6  aug    sun  81.6   56.7  665.6   1.9  27.8  35.0   2.7   0.0   0.00
512  4  3  aug    sun  81.6   56.7  665.6   1.9  27.8  32.0   2.7   0.0   6.44
513  2  4  aug    sun  81.6   56.7  665.6   1.9  21.9  71.0   5.8   0.0  54.29
514  7  4  aug    sun  81.6   56.7  665.6   1.9  21.2  70.0   6.7   0.0  11.16
515  1  4  aug    sat  94.4  146.0  614.7  11.3  25.6  42.0   4.0   0.0   0.00
516  6  3  nov    tue  79.5    3.0  106.7   1.1  11.8  31.0   4.5   0.0   0.00